UPSTREAM PR #20505: convert: Fix Qwen3.5/Qwen3.5 Moe NVFP4 Conversions #1267

Open
loci-dev wants to merge 2 commits into main from loci/pr-20505-nvfp4-fix-qwen-conversions
Conversation

@loci-dev

Note

Source pull request: ggml-org/llama.cpp#20505

This PR fixes several errors that occur when attempting to convert Qwen3.5/Qwen3.5 Moe models. To keep this PR's scope narrow and specific, a separate PR, ggml-org/llama.cpp#20506, adds support for loading these newly converted models.

Bug:
When attempting to use convert_hf_to_gguf.py on various Qwen3.5 and Qwen3.5 MoE models, it would abort with the following error(s):

ValueError: Can not map tensor 'model.language_model.layers.0.mlp.shared_expert.down_proj.weight'
ValueError: Can not map tensor 'model.language_model.layers.0.linear_attn.in_proj_a.weight'

This occurred because these models now carry a model.language_model or language_model prefix on their tensor names. The fix strips these wrapper prefixes instead of failing, which allows the conversion to continue.
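As a minimal sketch of that stripping step (the helper name and the exact prefix handling here are assumptions for illustration, not the PR's actual code):

```python
# Hypothetical sketch of the wrapper-prefix stripping; the real change in
# convert_hf_to_gguf.py may differ in naming and placement.
PREFIXES = ("model.language_model.", "language_model.")

def strip_wrapper_prefix(name: str) -> str:
    """Strip Qwen3.5 wrapper prefixes so tensor names map as before."""
    for prefix in PREFIXES:
        if name.startswith(prefix):
            # Re-attach the "model." root that the tensor map expects.
            return "model." + name[len(prefix):]
    return name

print(strip_wrapper_prefix(
    "model.language_model.layers.0.mlp.shared_expert.down_proj.weight"))
# prints: model.layers.0.mlp.shared_expert.down_proj.weight
```

Tensor names that already match the expected layout pass through unchanged.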
However, stripping the names and continuing was not enough to convert the models properly; it exposed a new error:

RuntimeError: shape '[16, 3, 1]' is invalid for input of size 1

This is because Qwen3.5's linear attention weights get reordered in modify_tensors():

# original order:  [q, k, v, z] * head_count
# corrected order: [q * head_count, k * head_count, v * head_count, z * head_count]
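The interleaved-to-blocked reordering above can be illustrated with a toy example (a sketch only: reorder_qkvz is a hypothetical name, and the real tensors are 2-D weight matrices, not lists of row labels):

```python
# Toy illustration of the linear-attention reordering: the checkpoint stores
# components interleaved per head as [q, k, v, z] * head_count, while the
# corrected layout gathers all q blocks first, then k, then v, then z.
def reorder_qkvz(rows, head_count):
    """[q0,k0,v0,z0, q1,k1,v1,z1, ...] -> [q0,q1,..., k0,k1,..., v..., z...]"""
    groups = 4  # one q, k, v, z block per head
    assert len(rows) == groups * head_count
    out = []
    for g in range(groups):          # q first, then k, v, z
        for h in range(head_count):  # gather that component from every head
            out.append(rows[h * groups + g])
    return out

print(reorder_qkvz(["q0", "k0", "v0", "z0", "q1", "k1", "v1", "z1"], head_count=2))
# prints: ['q0', 'q1', 'k0', 'k1', 'v0', 'v1', 'z0', 'z1']
```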

However, NVFP4 bypasses modify_tensors() and does its own repacking, so linear_attn.in_proj_a.input_scale was seen as a [num_v_heads] tensor and the repacking tried to reshape it into [16, 3, 1].
This is fixed by skipping tensors in the write loop that were already repacked:

if self._is_nvfp4:
    if name.endswith(".weight") and name.replace(".weight", ".weight_scale") in self.model_tensors:
        continue
    if name.endswith((".weight_scale", ".weight_scale_2", ".input_scale", "k_scale", ".v_scale")):
        continue

(Updated: added k_scale and v_scale to the suffix list above.)

and by applying the same reordering to:

linear_attn.in_proj_qkv
linear_attn.in_proj_z
linear_attn.in_proj_a
linear_attn.in_proj_b
linear_attn.out_proj

This will now produce the correct Qwen3.5/Qwen3.5MoE NVFP4 GGUF file. A separate PR must be applied to load these files.
This fixed the issue with both Qwen3.5-122B-A10B-NVFP4 and Qwen3.5-27B-NVFP4, both of which then produced correct output.
Qwen3.5-35B-A3B-NVFP4.gguf was also tested after returning k_scale and v_scale to the skip list.

Note: for the same model, some Qwen3.5 NVFP4 HF uploads produce this tokenizer error while others don't:

ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.

Workaround:
Edit the model's tokenizer_config.json and change tokenizer_class from TokenizersBackend to Qwen2Tokenizer
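A scripted version of the workaround might look like this (a sketch: fix_tokenizer_class is a hypothetical helper, and the demo below edits a throwaway config file rather than a real model directory):

```python
# Sketch of the tokenizer_config.json workaround described above.
import json
import tempfile
from pathlib import Path

def fix_tokenizer_class(model_dir: Path) -> bool:
    """Rewrite tokenizer_class from TokenizersBackend to Qwen2Tokenizer."""
    cfg_path = model_dir / "tokenizer_config.json"
    cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
    if cfg.get("tokenizer_class") != "TokenizersBackend":
        return False  # nothing to change
    cfg["tokenizer_class"] = "Qwen2Tokenizer"
    cfg_path.write_text(json.dumps(cfg, indent=2), encoding="utf-8")
    return True

# Demo against a temporary config file:
with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "tokenizer_config.json"
    p.write_text(json.dumps({"tokenizer_class": "TokenizersBackend"}))
    changed = fix_tokenizer_class(Path(d))
    new_class = json.loads(p.read_text())["tokenizer_class"]
print(changed, new_class)
# prints: True Qwen2Tokenizer
```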

@loci-review

loci-review bot commented Mar 18, 2026

No meaningful performance changes were detected across 120755 analyzed functions in the following binaries: build.bin.llama-tts, build.bin.libllama.so, build.bin.llama-cvector-generator, build.bin.libmtmd.so, build.bin.llama-bench, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.libggml.so, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli, build.bin.llama-tokenize, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

@loci-dev loci-dev force-pushed the main branch 12 times, most recently from 8c39ead to 418d9f2 Compare March 26, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 11 times, most recently from 1497621 to a67a372 Compare April 3, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 3 times, most recently from 3655621 to fd3ce9d Compare April 6, 2026 02:18
@loci-dev loci-dev force-pushed the main branch 7 times, most recently from 55afbee to ef0eff4 Compare April 12, 2026 02:18
@loci-dev loci-dev force-pushed the main branch 7 times, most recently from 245e873 to d101579 Compare April 17, 2026 02:18